Preprocess the corpus

First, we preprocess the Tweets to get them ready. This includes:

We then create a dictionary that assigns an ID to every word occurring in the Tweets, and convert each Tweet to a bag of words.
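To illustrate the dictionary and bag-of-words step, here is a minimal sketch in plain Python. It is not the notebook's own code: the helper names `build_dictionary` and `to_bag_of_words` and the two sample Tweets are hypothetical, and the Tweets are assumed to be tokenised already. A topic-modelling library would normally provide this step itself.

```python
from collections import Counter


def build_dictionary(tokenised_tweets):
    """Assign a consecutive integer ID to each distinct word, in order of first appearance."""
    word_to_id = {}
    for tokens in tokenised_tweets:
        for token in tokens:
            if token not in word_to_id:
                word_to_id[token] = len(word_to_id)
    return word_to_id


def to_bag_of_words(tokens, word_to_id):
    """Convert one tokenised Tweet to a sorted list of (word_id, count) pairs."""
    # Words missing from the dictionary are skipped rather than raising an error.
    counts = Counter(word_to_id[t] for t in tokens if t in word_to_id)
    return sorted(counts.items())


# Hypothetical, already-tokenised Tweets used only for illustration.
tokenised_tweets = [
    ["climate", "change", "is", "real"],
    ["climate", "climate", "hoax"],
]

word_to_id = build_dictionary(tokenised_tweets)
corpus = [to_bag_of_words(tokens, word_to_id) for tokens in tokenised_tweets]
```

Each Tweet becomes a list of `(word_id, count)` pairs, so word order is discarded and only word frequencies remain.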

Function for running the model. The model parameters can be adjusted here.

Sample of 100k Tweets

Denial Tweets

All 2020 Tweets

Climate Tweets by Year, 2006 - 2020

Serialise the year groupings from the dataset, so we don't have to reload the whole dataset when we only need the years as keys.
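One way to do this is sketched below. The storage format (JSON), the file name `year_groups.json`, the record layout with `year` and `text` fields, and the helper names are all assumptions made for illustration. The original notebook may well use a different format, such as pickle.

```python
import json
import os
import tempfile
from collections import defaultdict


def group_by_year(tweets):
    """Group Tweet texts by year (assumes each record has 'year' and 'text' fields)."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["year"]].append(tweet["text"])
    return dict(groups)


def save_year_groups(groups, path):
    """Write the groupings to disk; JSON object keys must be strings."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({str(year): texts for year, texts in groups.items()}, f)


def load_year_groups(path):
    """Read the groupings back and restore the integer year keys."""
    with open(path, encoding="utf-8") as f:
        return {int(year): texts for year, texts in json.load(f).items()}


# Hypothetical records standing in for the real dataset.
tweets = [
    {"year": 2006, "text": "first example tweet"},
    {"year": 2020, "text": "second example tweet"},
    {"year": 2020, "text": "third example tweet"},
]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "year_groups.json")
save_year_groups(group_by_year(tweets), path)

# Later, the years can be recovered without touching the original dataset.
year_groups = load_year_groups(path)
years = sorted(year_groups)
```

JSON keeps the file readable and portable, at the cost of converting the year keys to strings and back.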

Climate Denial Tweets 2006 - 2020

References: